When  reading  our  reports,  one  should  note  that  three  distinct  parallel
processors  are  discussed:  the  "ultracomputer",  a  message  passing  machine; 
the  "paracomputer",  an  idealized  shared  memory  machine;  and  the  "NYU 
Ultracomputer",  a  network-based  shared  memory  machine  that  approximates 
a  paracomputer.  Moreover,  since  the  NYU  Ultracomputer  has  replaced  the 
original  ultracomputer  as  our  preferred  design,  our  later  reports  often 
"abbreviate"  NYU  Ultracomputer  as  ultracomputer.  Finally,  our  basic 
coordination  primitive  has  been  called  NEWVAL,  replace-add,  and  fetch-and-add.
We  realize  that  such  highly  context-sensitive  language  may  be
confusing  and  this  guide  is  a  (long  overdue)  attempt  to  help  clarify  the 
situation.  In  addition,  by  presenting  a  historical  overview  of  the  project  I 
hope  to  group  together  those  reports  reflecting  work  done  at  roughly  the 
same  time  (and  hence  using  roughly  the  same  terminology).  Unfortunately, 
the  report  numbers  and  dates  reflect  only  the  order  in  which  the  work  was 
written  up  in  final  form.  Since  the  present  note  contains  no  new  technical 
material,  I  hope  that  you  will  permit  an  informal  exposition. 
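
For readers new to the coordination primitive named above, the following is a minimal sketch of its semantics only. The class and function names (`SharedCell`, `claim_slots`, `run_demo`) are hypothetical, and the lock serializes every update, so this says nothing about the nonserial implementations discussed in the reports. The distinction drawn here, that fetch-and-add returns the value held before the addition while replace-add returns the value after it, is an assumption based on later published usage of the two names.

```python
import threading


class SharedCell:
    """A shared integer whose add operation is made atomic with a lock.

    Hypothetical illustration: the lock makes the updates serial.
    """

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, increment):
        """Atomically add `increment`; return the value held BEFORE the add."""
        with self._lock:
            old = self._value
            self._value = old + increment
            return old

    def replace_add(self, increment):
        """Atomically add `increment`; return the value held AFTER the add
        (assumed semantics of the older name)."""
        return self.fetch_and_add(increment) + increment

    def read(self):
        with self._lock:
            return self._value


def claim_slots(cell, count, out):
    """Each call to fetch_and_add(1) hands the caller a distinct index."""
    for _ in range(count):
        out.append(cell.fetch_and_add(1))


def run_demo(workers=4, per_worker=100):
    """Run several threads that each claim `per_worker` slots."""
    cell = SharedCell(0)
    claimed = [[] for _ in range(workers)]
    threads = [
        threading.Thread(target=claim_slots, args=(cell, per_worker, claimed[i]))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    all_slots = sorted(s for part in claimed for s in part)
    return cell.read(), all_slots
```

Because every caller receives a different old value, several workers can claim distinct array slots or queue positions without any other coordination, which is the kind of use the name "coordination primitive" suggests.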

During  the  academic  year  1978-1979,  Jack  Schwartz  wrote  the  original 
paper  on  ultracomputers  [UC].  I  was  attending  NYU  graph  theory  seminars 
and  helped  Clyde  Kruskal  proofread  [UC].  This  important  paper  proposed 
two  models  of  parallel  computation.  The  primary  model,  which  Jack  called 
the  ultracomputer,  was  a  message  passing  machine  in  which  each
processor-with-memory  was  connected  to  4  others  -  the  key  connection  being  the
perfect  shuffle.  In  addition,  he  proposed  an  idealized  model,  the 
paracomputer,  in  which  a  large  central  memory  could  be  simultaneously 
accessed  in  a  single  cycle  by  multiple  independent  processors.  Although
fan-in  limitations  would  prevent  this  latter  model  from  being  realized,  it  has
served  as  a  standard  against  which  others  are  compared.  Soon  Malcolm 
Harrison  and  Larry  Rudolph  became  interested  in  Jack's  paper  and  the
ultracomputer  group  was  unofficially  born.


As  luck  would  have  it,  the  next  year,  1979-1980,  was  my  sabbatical  year
from  CUNY,  so  I  went  to  NYU  to  continue  studying  graph  theory.  However,
the  lure  of  an  embryonic  project  on  a  fascinating  subject  was  too  strong.  I 
switched  to  parallel  processing.  All  the  work  from  that  early  period 
concerned  the  ultracomputer  model.  Lambert  Meertens,  visiting  Courant
from  the  Mathematisch  Centrum,  presented  a  stepwise  refinement  derivation
of  parallel  bitonic  sort  [UCN1]  and  showed  that  large  ultracomputers  could
not  be  built  from  small  ones  [UCN2].  Jack  proposed  one  programming
language  [UCN3]  and  I  wrote  a  simulator  for  another  [UCN10]  and  [UCN14],
which  Clyde  and  I  subsequently  extended  [UCN15].  We  also  wrote  [UCN11],
which  extended  the  applicability  of  algorithms  in  [UC]  and  showed  that
permutations  are  hard  to  parallelize.  (The  latter  result  is  to  appear  as
[PERM],  and  Clyde  eventually  parlayed  this  study  into  his  thesis  [CT].)
Clyde  and  Larry  [UCN6]  developed  an  idea  of  Malcolm  Harrison  and  Mal
Kalos  -  multidimensional  ultracomputers.  Charles  Peskin  and  Olof  Widlund
investigated  scientific  programming  on  the  ultracomputer  [UCN18], 
[UCN19],  &  [UCN20].  There  was  also  a  (short-lived)  flurry  of  activity  in
constructing  "nearly  planar"  shuffle-exchange  interconnections  [UCN4], 
[UCN8],  &  [UCN9]. 

At  this  point  the  project  was  strongly  influenced  by  an  event  that 
occurred  outside  NYU.  Burroughs  Corporation  proposed  an  MIMD
omega-network-based  shared  memory  machine  for  the  NASA  Ames  "digital  wind
tunnel"  (NASF).  We  read  their  phase  one  proposal  and  traveled  to  Paoli,  PA
to  exchange  ideas.  Although  we  had  hoped  to  have  our  work  influence  this 
potentially  important  commercial  machine,  in  retrospect  I  believe  that  they 
affected  us  more  than  we  affected  them. 

Jack,  realizing  that  such  a  network-based  machine  could  provide  a 
realizable  approximation  to  the  paracomputer,  wrote  [UCN5],  which 
provided  a  theoretical  explanation  of  why  the  Burroughs  machine  attained 
good  average  case  performance  for  data  permutations,  and  Clyde  and  I
resurrected  an  ultracomputer  permutation  algorithm  with  good  average  case
performance  [UCN7]  that  we  had  discarded  while  working  on  [UCN11]  (since
its  worst  case  performance  is  poor).  The  Burroughs  design  stimulated  our
interest  in  paracomputers  both  as  theoretical  models  and  as  ideal  machines 
toward  which  pragmatic  machine  designs  should  aim;  for  example,  Clyde 
adapted  the  ultracomputer  algorithms  of  [UCN11]  to  paracomputers  [UCN26]
(another  part  of  [CT]). 

About  a  dozen  years  earlier  Ralph  Grishman  had  written  a  software 
simulator  of  a  machine  quite  similar  to  the  paracomputer.  This  simulator, 
dubbed  SOAPSUDS,  was  used  for  some  early  parallel  processing  studies  at 
NYU  [BU1]  &  [BU2]  around  1970.  After  extensive  conversations  with  Ralph
(who  subsequently  became  our  first  hardware  designer),  I  wrote  a 
paracomputer  simulator  called  WASHCLOTH  [UCN12]  &  [UCN21],  which 
has  since  been  heavily  used  for  scientific  programming  [UCN22],  [UCN23],
[UCN24],  [UCN27],  [UCN30],  [UCN31]  (David  Korn,  on  leave  from  Bell
Labs  1980-81,  did  systems  work  on  WASHCLOTH  as  well  as  scientific
programming;  Gabi  Leshem  was  a  graduate  student  working  on  the  project;
and  Norman  Rushfield  helped  us  parallelize  a  large  NASA  "weather  code").
I  believe  that  the  continual  reliance  on  realistic  scientific  applications  to  test
our  theories  has  been  a  significant  strength  of  the  project.

The  1970  studies  used  a  single  "NEWVAL"  primitive  to  coordinate  the
processors.  Although  NEWVAL  was  supported  by  SOAPSUDS,  no  one  then
had  any  idea  how  to  implement  it  in  a  nonserial  manner.  We  also  used  this
primitive  (but  called  it  replace-add)  in  WASHCLOTH  and  hoped  to  find  an
efficient  implementation  on  some  paracomputer  approximation  like  the
Burroughs  machine.  Larry  found  a  way  to  do  this  by  enhancing  the  switches
in  the  Burroughs  design.  Moreover,  Larry,  Boris  Lubachevsky,  and  I  found
fully  parallel  algorithms  for  queue  insertion  and  deletion.  These  last  two
discoveries,  plus  additional  fully  parallel  software  primitives,  were  the  subject
of  [UCN16]  and  resulted  in  our  current  emphasis  on  network-based  designs
since  we  came  to  realize  that  most  other  architectures  do  not  permit  such
fully  parallel  algorithms  (for  example,  see  [UCN17],  which  has  appeared  as
[TC]).  An  abbreviated  version  of  [UCN16]  appeared  as  [PP],  essentially  the
entire  work  is  to  appear  as  [TOPLAS],  and  Larry  parlayed  it  into  [LT].
After  writing  a  program  to  check  several  algorithms  in  [UCN16],  Boris
became  interested  in  automatic  verification  of  parallel  programs,  eventually
producing  [UCN33].

This  past  year  1980-81  began  with  the  arrival  of  Marc  Snir  and  saw
Kevin  McAuliffe  become  actively  involved.  As  the  year  closed  the  entire
group  (expanded  by  the  mid-year  arrival  of  Shaula  Yemini)  realized  that  we
should  seriously  consider  building  a  prototype.  During  that  year  Marc
analyzed  general  connection  networks  [UCN29]  and  produced  a  modified
WASHCLOTH  that  introduced  delays  based  on  his  queuing  theory
approximation  of  our  proposed  network  [UCN28];  Boris  surveyed  the
Russian  literature  in  parallel  processing  [UCN35];  Jack  and  I  summarized  the
group's  network-based  work  [UCN25],  which  appeared  as  [C];  and  I  wrote  a
toy  parallel  operating  system  MOP  [UCN13]  on  top  of  WASHCLOTH  to  test
several  software  primitives  in  [UCN16].  By  then  we  had  a  definite
macroscopic  machine  design  as  well  as  a  detailed  design  for  selected
components  (e.g.  the  switches).  We  presented  this  design  in  [UCN32]  (a
nearly  final  version  of  which  appeared  as  [SD])  and  named  it  the  NYU
Ultracomputer  on  the  grounds  that  what  else  could  you  call  a  machine
designed  by  the  NYU  Ultracomputer  project  (definition  "Mathematics"  -  the
subject  matter  studied  by  mathematicians).  Finally,  Clyde  and  I  found  a  new
parallel  implementation  of  replace-add  [UCN34].  This  implementation  also
relied  on  enhanced  switches;  in  fact  it  was  quite  similar  to  the  [UCN16]
implementation  (but  generalized  to  other  important  operations).  Since  the
semantics  of  the  newly  implemented  operation  were  (trivially)  different  from
those  of  the  previous  one,  I  eventually  agreed  with  Mal  and  Clyde  that  we
should  switch  to  the  (more  accurate)  name,  "fetch-and-add".

The  new  year  1981-1982  brings  us  Pat  Teller,  Jim  Lipkis,  George
Holober,  and  Jim  Wilson  as  well  as  a  strong  desire  to  construct  a  prototype
(NYU)  Ultracomputer.